To examine the relation between financial cross-correlations and the
conventional Random Matrix Theory, we analyse several characteristics of
stock market correlation matrices: the distribution of eigenvalues, the
cross-correlations among signs of the returns, the volatility
cross-correlations, and the multifractal characteristics of the principal
values. The results indicate that the stock market dynamics is not simply
decomposable into 'market', 'sectors', and the Wishart random bulk. This is
clearly seen when the time series used to construct the correlation matrices
are sufficiently long and the measurement noise is thus suppressed. Instead,
a hierarchically convoluted and highly nonlinear organization of the market
emerges, indicating that the relevant information about the whole market is
already encoded in its constituents.

PACS numbers: 89.20.-a, 89.65.Gh, 89.75.-k

1. Introduction 

The financial markets are probably the most complex structure associated
with contemporary civilization. They involve extremely many constituents,
many different space and time scales, and an uncountable number of convoluted
factors that drive the financial dynamics towards real complexity. Its most
relevant feature is a permanent competitive coexistence of collectivity and
noise. The related quantitative characteristics can be studied using
multivariate ensembles of parameters that represent the dynamics of various
financial assets. Because of this multi-dimensionality, the most natural and
efficient formal frame for quantifying the whole variety of effects connected
with complexity is in terms of matrices [1]. Since the dynamics of complexity
is inherently embedded in noise, the Random Matrix Theory (RMT) [2, 3] offers
an appropriate reference. Deviations from RMT help to detect real information
and potentially to extract it from what is universal in the RMT sense and
thus practically not very informative. An extremely useful matrix approach to
the financial dynamics is based on correlation matrices formed (i) from the
time series representing the price changes of a certain basket of different
assets over the same period of time or (ii) from the time series representing
different disconnected periods of time (days or weeks) for either a single
asset or an index. The simplest commonly used variant of RMT to serve as a
null hypothesis in these cases corresponds to the ensemble of Wishart
matrices [3]. The resulting eigenvalue distribution $\rho(\lambda)$ is then
described by the Marchenko-Pastur formula O |6], which confines this
distribution within the bounds

$$\lambda_{\min}^{\max} = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}. \qquad (1)$$

Here $Q = T/N$, where $N$ is the number of time series of length $T$. Relating
eigenspectra of the empirical $N \times N$ stock market correlation matrices
to this formula shows that typically only a few of the eigenvalues,
representing a global or some more local collective moves within the market,
are located sizeably above $\lambda_{\max}$, while the bulk of the empirical
eigenvalue distribution falls satisfactorily within the lower and the upper
bound. This is interpreted as an indication that eigenvectors associated with
the bulk are indistinguishable from noise and thus carry no information. Such
a situation is quite convincing in the case denoted above as (ii) [3 [8]. The
results of the original study of cross-correlations among the stock market
companies were interpreted analogously [9, 10]. A more systematic analysis of
this kind of correlations (case (i) above) shows, however [11], that they are
much more subtle: the overlap of the bulk with the bounds prescribed by RMT
dissolves as $T$ increases, and eigenvectors even from the middle of the
spectrum carry significant information. Below we recapitulate the most
relevant results and provide some further arguments in favor of the statement
that nontrivial information is encoded also in the bulk of the eigenvalue
spectrum of the stock market correlation matrix (see also [12]). These
results should also be taken into account in the context of the Markowitz
optimal portfolio theory [13] and for denoising of the empirical correlation
matrices [TS].



In the financial context one considers a portfolio $P$ consisting of a number
of securities $X_s$, $s = 1, \ldots, N$, associated with weights $w_s$ that
reflect the fraction of the total capital invested in a particular security.
On the time scale $\Delta t$ the return of such a portfolio at the instant of
time $t_j$ is the weighted sum

$$G_P(j, \Delta t) = \sum_{s=1}^{N} w_s \, g_s(j, \Delta t) \qquad (2)$$

of logarithmic price increments

$$g_s(j, \Delta t) = \ln p_s(t_j + \Delta t) - \ln p_s(t_j), \qquad s = 1, \ldots, N; \; j = 1, \ldots, T \qquad (3)$$

of individual securities $X_s$. Each such return can be considered a product

$$g_s(j) = \mathrm{sign}_s(j) \times v_s(j) \qquad (4)$$

of its sign and of its magnitude $v_s(j)$, which measures the volatility.
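
A minimal sketch of Eqs. (2)-(4) in Python with NumPy may help to fix the
notation. The price array, the weights, and the function names are
hypothetical and serve only as an illustration; the prices are assumed to be
sampled already at the interval $\Delta t$.

```python
import numpy as np

def log_returns(prices):
    """Logarithmic price increments g_s(j), Eq. (3); prices has shape (N, T+1)."""
    log_p = np.log(np.asarray(prices, dtype=float))
    return log_p[:, 1:] - log_p[:, :-1]

def split_sign_magnitude(g):
    """Decompose returns into sign and magnitude (volatility), Eq. (4)."""
    return np.sign(g), np.abs(g)

def portfolio_returns(weights, g):
    """Weighted sum of the individual returns, Eq. (2)."""
    return np.asarray(weights, dtype=float) @ g

# Hypothetical prices of N = 2 securities at 4 consecutive instants.
prices = np.array([[100.0, 101.0, 99.5, 100.2],
                   [50.0, 50.5, 50.4, 51.0]])
g = log_returns(prices)
signs, vol = split_sign_magnitude(g)
G = portfolio_returns([0.6, 0.4], g)
```

By construction the product `signs * vol` restores the original returns `g`.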

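These time series can be used to create an $N \times T$ data matrix
$\mathbf{M}$ and then a correlation matrix $\mathbf{C}$ according to the
formula

$$\mathbf{C} = (1/T)\, \mathbf{M} \mathbf{M}^{\mathrm{T}}. \qquad (5)$$

Provided that each row of $\mathbf{M}$ is normalized to zero mean and unit
variance, each element of $\mathbf{C}$ is thus the Pearson correlation
coefficient $C_{mn}$ between a pair of signals $m$ and $n$. By solving the
eigenvalue problem

$$\mathbf{C} \mathbf{x}_i = \lambda_i \mathbf{x}_i, \qquad i = 1, \ldots, N, \qquad (6)$$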
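
The construction in Eqs. (5) and (6) can be sketched as follows. The sketch
assumes that each series is standardized to zero mean and unit variance
before $\mathbf{M}$ is formed; the synthetic Gaussian data and the function
names are illustrative and are not part of the study.

```python
import numpy as np

def correlation_matrix(g):
    """C = (1/T) M M^T with M the row-wise standardized returns, Eq. (5)."""
    g = np.asarray(g, dtype=float)
    T = g.shape[1]
    M = (g - g.mean(axis=1, keepdims=True)) / g.std(axis=1, keepdims=True)
    return M @ M.T / T

def eigen_decomposition(C):
    """Solve C x_i = lambda_i x_i, Eq. (6); eigenvalues in descending order."""
    lam, vecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return lam[order], vecs[:, order]  # column i is the eigenvector x_i

rng = np.random.default_rng(0)
g = rng.standard_normal((5, 2000))     # synthetic, uncorrelated "returns"
C = correlation_matrix(g)
lam, X = eigen_decomposition(C)
```

Since the diagonal of $\mathbf{C}$ equals one, the eigenvalues sum to $N$.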

this matrix can be transformed to the diagonal form. From the point of view
of investment theories, each eigenvector $\mathbf{x}_i$ can be considered as
a realization of an $N$-security portfolio $P_i$ with the weights equal to
the eigenvector components $x_i^{(k)}$, $k = 1, \ldots, N$. For a
non-degenerate matrix $\mathbf{C}$, $P_i$ and $P_j$ are independent for each
pair of their indices, which allows one to choose a portfolio whose risk is
independent of the others. According to the classical theory [13], the risk
$R(P) = \sigma^2(P) = \mathrm{var}\{G_P(j)\}_{j=1}^{T}$ for the relevant
group of securities can be related to correlations (or covariances) between
the time series of individual security returns $g_s(j)$, $j = 1, \ldots, T$.

Each eigenvector determined by Eq. (6) (and thus each portfolio) can be
associated with the corresponding time series of the portfolio's returns by
the expression

$$z_i(j, \Delta t) = \sum_{k=1}^{N} x_i^{(k)} g_k(j, \Delta t), \qquad i = 1, \ldots, N; \; j = 1, \ldots, T, \qquad (7)$$

which is analogous to Eq. (2). These principal value time series we shall
call the eigensignals $Z_i$ (see also [3 [16] for some alternative
realizations). The risk associated with such eigensignals is related to the
corresponding eigenvalues:

$$R(P_i) = \sigma^2(Z_i) = \mathbf{x}_i^{\mathrm{T}} \mathbf{C} \mathbf{x}_i = \lambda_i. \qquad (8)$$

Thus, the eigenvalue size is a risk measure and, in consequence, the larger
$\lambda_i$, the larger the variance of $Z_i$ and also the larger the risk of
the corresponding portfolio $P_i$.
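
The relation between eigensignals and eigenvalues in Eqs. (7) and (8) can be
checked numerically. The following sketch uses synthetic returns with a
common factor (an assumption made only for this illustration) and
standardized series, so that the variance of each eigensignal reproduces the
corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 5000
common = rng.standard_normal(T)                    # hypothetical common factor
g = 0.5 * common + rng.standard_normal((N, T))     # synthetic returns

# Standardize each series and build the correlation matrix, Eq. (5).
M = (g - g.mean(axis=1, keepdims=True)) / g.std(axis=1, keepdims=True)
C = M @ M.T / T

lam, X = np.linalg.eigh(C)     # ascending eigenvalues, eigenvectors in columns

# Eigensignals z_i(j) = sum_k x_i^(k) g_k(j), Eq. (7); row i belongs to lam[i].
Z = X.T @ M

# Risk of each eigenportfolio: variance of its eigensignal, Eq. (8).
risk = Z.var(axis=1)
```

In this setting `risk` coincides with `lam` up to rounding, and different
eigensignals are mutually uncorrelated.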
3. Data specification

This study of inter-stock correlations is based on high-frequency data from
the American stock market [17] in the period from 1 Dec 1997 to 31 Dec 1999.
We chose a set of stocks of $N = 100$ highly capitalized companies listed on
NYSE or NASDAQ (capitalization above $10^{10}$ USD in each case). These
stocks are traded sufficiently frequently (0.01-1 transactions/s) that the
time scale of $\Delta t = 5$ min allows a statistically significant analysis.
For such a time scale the length of the time series exceeds 40,000 data
points. For our present purpose this time scale and the corresponding length
of the series turn out to constitute a reasonable compromise. Thinking in
terms of the Epps effect [H], in the liquid contemporary markets the time
horizon of $\Delta t = 5$ min is long enough for the cross-correlations to be
sufficiently expressed beyond the noise level [20]. For the data considered
here, the time horizon at which $\lambda_1$ saturates at its maximum
corresponds to about 30 min, while for $\Delta t = 5$ min $\lambda_1$ assumes
approximately 2/3 of its saturation level. The length of the time series, on
the other hand, allows one to study the $T$ dependence of cross-correlations
in a relatively broad range of time intervals, up to the maximum which
corresponds to $Q = 406$.
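
For orientation, the Marchenko-Pastur bounds for the two values of $Q$ used
below can be evaluated directly. The sketch assumes the standard form of the
bounds for series of unit variance,
$\lambda_{\min}^{\max} = 1 + 1/Q \pm 2\sqrt{1/Q}$; the function name is
illustrative. With $N = 100$, the value $Q = 406$ corresponds to
$T = 40{,}600$ points.

```python
import math

def marchenko_pastur_bounds(Q, sigma2=1.0):
    """Lower and upper edge of the Wishart (Marchenko-Pastur) spectrum for Q = T/N >= 1."""
    if Q < 1:
        raise ValueError("this sketch covers only Q >= 1")
    half_width = 2.0 * math.sqrt(1.0 / Q)
    centre = 1.0 + 1.0 / Q
    return sigma2 * (centre - half_width), sigma2 * (centre + half_width)

N = 100
bounds = {Q: marchenko_pastur_bounds(Q) for Q in (3, 406)}
for Q, (lam_min, lam_max) in bounds.items():
    print(f"Q = {Q:3d}, T = {Q * N:5d}: [{lam_min:.3f}, {lam_max:.3f}]")
```

The interval is roughly [0.18, 2.49] for $Q = 3$ and narrows to roughly
[0.90, 1.10] for $Q = 406$, which is why the long series provide a much more
stringent test.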

4. Data analysis 

One natural characteristic that may offer some introductory insight when
relating a given correlation matrix to the RMT is the distribution of matrix
elements. For our correlation matrices two such distributions are shown in
Figure 1 together with the best Gaussian fits: one corresponding to the full
$Q = 406$ and one to $Q = 3$, the latter obtained by properly windowing the
same time series and averaging over the windows. Both distributions are
shifted towards positive values. In the case of $Q = 406$ the distribution is
naturally much narrower than for $Q = 3$ and shows essentially no negative
matrix elements. This signals that the real correlations are less
contaminated by the measurement noise for $Q = 406$. The Gaussian fit in this
case is also less satisfactory, especially in the region of larger positive
values of $C_{mn}$. Here, on the level of 1% probability, one finds
deviations of about $8\sigma$ (mean standard deviation), while for $Q = 3$
analogous deviations reach at most $3\sigma$.

5. Eigenvalue distribution 

A complementary and even more informative characteristic of the matrix is its
eigenvalue distribution. Figures 2(a) and 2(b) show all 100 eigenvalues
distributed along the horizontal axis, denoted by vertical lines, for the
above presented cases of $Q = 406$ and $Q = 3$, respectively. The largest
eigenvalue $\lambda_1$, assuming very similar values (about 18 to 19) and
repelled from the rest of the spectrum, is seen in both cases and describes
the collective eigenstate which can be identified with the market. The rest
of the spectra, however, develop a significantly different structure. For
$Q = 3$ this rest covers a much wider range of values, but at the same time
its overlap with the region of the corresponding random Wishart matrices
(shaded vertical region), whose bounds are prescribed by Eq. (1), is very
substantial (87%), while for $Q = 406$ it is rudimentary and looks like pure
coincidence. Of course, as far as agreement of the empirical spectra with the
RMT is concerned, the case of $Q = 406$ is much more meaningful than that of
$Q = 3$. One more interesting, and probably related, effect is that for
$Q = 406$ one sees (Fig. 2(a)) the second, $\lambda_2$, and even the third,
$\lambda_3$, eigenvalues also clearly separated from the bulk. These
eigenvalues can be related to some branch-specific factors. No such factors
can directly be seen for $Q = 3$ (Fig. 2(b)).

Fig. 1. Probability density distribution of entries of the empirical
correlation matrices C calculated for 100 highly capitalized American
companies over the period 1998-1999; the solid line corresponds to $Q = 406$
and the dashed line to $Q = 3$. The corresponding best Gaussian fits are
indicated by the dotted lines.

Due to the matrix trace conservation (here $\mathrm{Tr}\,\mathbf{C} = 100$),
the existence of strong collective components can effectively suppress the
noisy part of the $\mathbf{C}$ eigenspectrum, shifting smaller eigenvalues
towards zero, and thus may distort their relation to the RMT case. In order
to correct for this effect, which affects the case of Figure 2(a) more
strongly, it is recommended to remove the market factor $Z_1$ from the data
[21]. One way to do this is by least-squares fitting of this factor,
represented by $z_1(j)$, to each of the original stock signals $g_k(j)$:

$$g_k(j) = \alpha_k + \beta_k z_1(j) + \epsilon_k^{(1)}(j), \qquad (9)$$

where $\alpha_k, \beta_k$ are parameters, and then one can construct a new
correlation matrix $\mathbf{C}^{(1)}$ from the residuals
$\epsilon_k^{(1)}(j)$ (e.g. ref. [El [21]). After this is performed,
significantly more eigenvalues for $Q = 406$ fall within the shaded RMT
region, as Figure 2(c) illustrates. For $Q = 3$ such a removal does not
affect the bulk of the original spectrum so much, as a comparison of Figure
2(d) with 2(b) shows. Such a removal can be executed once again and the
$\lambda_2$ components can also be removed, leading to the eigenspectra
presented in Figs. 2(e) and 2(f). The effect of this second removal is
already much smaller but is more noticeable in the former case. In the
corresponding Figure 2(e) one still finds only 49% of the eigenvalues
overlapping with the RMT interval $[\lambda_{\min}, \lambda_{\max}]$. This is
almost a factor of two less than in the case of $Q = 3$ in Figure 2(f), and
only this latter result remains in agreement with results presented earlier
in [9, 21] (based on daily data but with similarly small values of $Q$),
where a vast majority of the eigenvalues was within the RMT bounds.
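
The removal of the market factor described by Eq. (9) can be sketched as an
ordinary least-squares regression of each return series on the leading
eigensignal. The synthetic one-factor data and all names below are
assumptions made for illustration only; they are not the data or the code
used in this study.

```python
import numpy as np

def remove_factor(g, z):
    """Residuals of regressing every row of g on the factor z with an intercept, Eq. (9)."""
    g = np.asarray(g, dtype=float)
    z = np.asarray(z, dtype=float)
    A = np.column_stack([np.ones_like(z), z])           # design matrix, shape (T, 2)
    coef, *_ = np.linalg.lstsq(A, g.T, rcond=None)      # rows: alpha_k, beta_k
    return g - (A @ coef).T

rng = np.random.default_rng(2)
N, T = 6, 4000
market = rng.standard_normal(T)                         # hypothetical market factor
g = 0.8 * market + rng.standard_normal((N, T))

M = (g - g.mean(axis=1, keepdims=True)) / g.std(axis=1, keepdims=True)
lam, X = np.linalg.eigh(M @ M.T / T)                    # ascending eigenvalues
z1 = X[:, -1] @ M                                       # most collective eigensignal

residuals = remove_factor(g, z1)
lam_residual = np.linalg.eigvalsh(np.corrcoef(residuals))
```

In this toy setting the largest eigenvalue of the residual correlation matrix
is markedly smaller than the original $\lambda_1$, and the residuals are
orthogonal to $z_1$ by construction.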



6. Auxiliary tests 

There is potentially one effect that may partly be responsible for such a
sizeable disagreement between the $Q = 406$ empirical results and the
corresponding RMT results. Namely, there exist some temporal correlations,
especially the volatility correlations, in the individual empirical time
series that may effectively reduce the number of independent events in each
series. If this is the case, then the parameter $Q$ used in the reference
RMT formula should be proportionally smaller, the RMT bounds wider, and the
agreement thus improved. In order to verify to what extent such an effect may
be present here, we perform the following exercise. Imagine that each time
series progresses along its own independent circle, such that the end of the
series is connected to its starting point. The circles are then rotated
against each other by random angles. This procedure preserves the internal
correlations within each series but destroys the cross-correlations. The
spectrum of eigenvalues of the so-randomized empirical correlation matrix is
shown in Figure 3(a). A perfect coincidence between this empirical result and
the RMT result can be seen. This provides strong evidence that the
corresponding disagreement in Fig. 2 is entirely due to the real
cross-correlations and is fully informative.
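
The rotation test amounts to a circular shift of every series by its own
random offset. A minimal sketch is given below; the parameter `step`, which
restricts the offsets to multiples of a fixed number of points, and the
synthetic data are assumptions introduced only for this illustration.

```python
import numpy as np

def circular_shift_surrogate(g, rng, step=1):
    """Rotate each series by an independent random offset that is a multiple of `step`."""
    g = np.asarray(g, dtype=float)
    n_series, length = g.shape
    shifts = step * rng.integers(0, length // step, size=n_series)
    return np.stack([np.roll(row, int(s)) for row, s in zip(g, shifts)])

def largest_eigenvalue(g):
    """Largest eigenvalue of the Pearson correlation matrix of the rows of g."""
    return np.linalg.eigvalsh(np.corrcoef(g))[-1]

rng = np.random.default_rng(3)
N, T = 10, 5000
g = 0.8 * rng.standard_normal(T) + rng.standard_normal((N, T))  # common factor + noise

surrogate = circular_shift_surrogate(g, rng)
lam_original = largest_eigenvalue(g)
lam_surrogate = largest_eigenvalue(surrogate)
```

Each surrogate series contains exactly the same values in the same cyclic
order as the original one, so its autocorrelations are preserved, while the
collective eigenvalue produced by the common factor disappears.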
Fig. 2. Empirical eigenvalue spectrum of the correlation matrix C (vertical
lines), calculated for 100 highly capitalized American companies over the
period 1998-1999 for $Q = 406$ (a) and for $Q = 3$ (b); the eigenvalues of a
random Wishart matrix with the same $Q$ may lie only within the shaded
vertical region. Eigenvalue spectrum after effective rank reduction of C,
i.e. after subtracting the contribution of the most collective eigensignal
$Z_1$ for $Q = 406$ (c) and for $Q = 3$ (d), and of the two most collective
ones $Z_1$ and $Z_2$ for $Q = 406$ (e) and $Q = 3$ (f).



Fig. 3. Eigenvalue spectrum of the test correlation matrices for $Q = 406$
obtained after an unrestricted random shift (see text) of all the original
empirical time series against each other (a, panel title "arbitrarily rotated
returns") and after restricting this random shift to the multiples of one
full trading day (b, panel title "daily rotated returns"). Shaded regions
correspond to RMT predictions for the same value of $Q$.

As another related test the above circles are randomly rotated, but this time
the rotation angles are restricted to the multiples of one full trading day
(daily rotated). Now, as is shown in Figure 3(b), the empirical spectrum
broadens by more than a factor of two relative to the previous case and, of
course, by the same factor relative to the RMT bounds. This result may
reflect the presence of day-to-day repeatable intraday patterns of activity
that affect various securities at similar instants of time during the day.

Fig. 4. Eigenvalue spectrum of the test correlation matrices for $Q = 406$
obtained after the signs (Eq. (4)) are randomly reshuffled, independently in
each empirical time series, and the magnitudes are left unchanged (a, panel
title "preserved volatility, randomized sign"), and after the magnitudes are
randomly reshuffled and the signs left unchanged (b, panel title "randomized
volatility, preserved sign"). Shaded regions correspond to RMT predictions
for the same value of $Q$.

Fig. 5. Eigenvalue spectrum of the test correlation matrices for $Q = 406$
obtained from the time series of signs (a) and from the time series of moduli
of the empirical returns (b) as decomposed by Eq. (4). Shaded regions
correspond to RMT predictions for the same value of $Q$.

As a further examination of the character of the stock market
cross-correlations, two more types of artificial series based on the original
empirical data are created using the decomposition as in Eq. (4). Before the
correlation matrix is calculated, either (a) the signs $\mathrm{sign}_s(j)$
are randomly reshuffled but the return magnitudes $v_s(j)$ are left at their
original places or, vice versa, (b) the signs are left original but $v_s(j)$
are reshuffled, for each series independently. The resulting spectra are
shown in Figures 4(a) and 4(b), respectively. From this perspective the signs
turn out to be responsible for the cross-correlations much more than the
corresponding magnitudes of the returns. As is clearly seen, randomizing the
signs washes out the cross-correlations almost completely (though deviations
relative to the RMT still remain), while randomizing $v_s(j)$ with the signs
unaltered largely preserves the original (Fig. 2(a)) structure of the
spectrum. To a good approximation the spectrum of Fig. 4(b) looks compressed
by a factor of about two relative to that in Fig. 2(a).
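
The two reshuffling procedures can be sketched as follows. The synthetic
Gaussian one-factor data stand in for the empirical returns and are an
assumption of this illustration; they do not reproduce the volatility
clustering of real data, so only the qualitative ordering of the largest
eigenvalues is meaningful here.

```python
import numpy as np

def _shuffle_rows(a, rng):
    """Permute the entries of every row independently."""
    return np.stack([rng.permutation(row) for row in a])

def shuffle_signs(g, rng):
    """Case (a): signs reshuffled within each series, magnitudes kept in place."""
    g = np.asarray(g, dtype=float)
    return _shuffle_rows(np.sign(g), rng) * np.abs(g)

def shuffle_magnitudes(g, rng):
    """Case (b): magnitudes reshuffled within each series, signs kept in place."""
    g = np.asarray(g, dtype=float)
    return np.sign(g) * _shuffle_rows(np.abs(g), rng)

def largest_eigenvalue(g):
    """Largest eigenvalue of the Pearson correlation matrix of the rows of g."""
    return np.linalg.eigvalsh(np.corrcoef(g))[-1]

rng = np.random.default_rng(5)
N, T = 10, 5000
g = 0.8 * rng.standard_normal(T) + rng.standard_normal((N, T))

g_a = shuffle_signs(g, rng)
g_b = shuffle_magnitudes(g, rng)
lam_0, lam_a, lam_b = (largest_eigenvalue(x) for x in (g, g_a, g_b))
```

Even in this toy model, randomizing the signs removes the collective
eigenvalue almost entirely, whereas randomizing the magnitudes only reduces
it.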

As supplementary material to this kind of test analysis, in Figure 5 we show
the spectra of the correlation matrices calculated (a) from the time series
of $\mathrm{sign}_s(j)$ and (b) from the time series of $v_s(j)$,
independently. Consistently with the observation made in Figure 4(b), the
time series of the empirical returns' signs show a structure of
cross-correlations very similar to the full original result (Figure 2(a)). In
view of the result presented in Figure 4(a), it may however be considered
somewhat surprising that the top eigenvalues appear even a bit larger
(Figure 5(b)) in the second case, that of the $v_s(j)$ time series. Relevant
here is that these volatility-related cross-correlations manifest their
presence only when the returns' signs are entirely discarded, i.e., when
their moduli are taken, which is a nonlinear operation. The correlation
matrix detects the linear (cross-)correlations, but detecting linear
correlations in volatility means detecting the nonlinear correlations in
returns. Thus the above results taken together also point to the complex
nonlinear character of the financial cross-correlations.

Fig. 6. Time series of the eigensignals for $\lambda_1$ (top) and
$\lambda_{52}$ (bottom). Note the different scales on the vertical axes of
the two panels.

7. Eigensignal properties 

A deeper exploration of the relation between the characteristics of the
empirical financial cross-correlations and those of the conventional RMT
needs to involve also the eigensignals, since they directly reflect the
dynamics of the corresponding portfolio. Figure 6 presents the time series of
the eigensignal returns $z_i(j)$ calculated according to Eq. (7) for the most
collective eigenstate associated with $\lambda_1$ and for another one
associated with $\lambda_{52}$. Even though this latter case corresponds to
the middle of the empirical spectrum, strongly overlapping with the RMT
region, it appears difficult to detect any significant differences if one
compares both series visually, ignoring the different scales on the vertical
axes. Both eigensignals are nonstationary with likely extreme fluctuations,
and both of them also exhibit volatility clustering. A compact way to
quantify the related effects is in terms of the multifractal spectra. It is a
well established fact that stock returns form signals which are multifractal
both on daily and on high-frequency time scales [23 [Ml [MIES].

In order to evaluate the singularity spectra $f(\alpha)$ we use the
Multifractal Detrended Fluctuation Analysis (MFDFA) [26] method, which for
the present purpose appears [27] more stable than the Wavelet Transform
Modulus Maxima (WTMM) method [28]. Accordingly, we start from our eigensignal
$i$ represented by the time series $z_i(j)$ of length $N_{es}$ and evaluate
the signal profile

$$Y(j) = \sum_{k=1}^{j} \left( z_i(k) - \langle z_i \rangle \right), \qquad j = 1, \ldots, N_{es}, \qquad (10)$$

where $\langle \ldots \rangle$ denotes averaging over $z_i(k)$. In the next
step $Y$ is divided into $M_{es}$ segments of length $n$ ($n < N_{es}$),
starting from both the beginning and the end of the time series, so that
eventually there are $2M_{es}$ segments. In each segment $\nu$ a local trend
is removed by fitting an $l$-th order polynomial $P_\nu^{(l)}$ to the data.
Then, after calculating the variance

$$F^2(\nu, n) = \frac{1}{n} \sum_{k=1}^{n} \left\{ Y[(\nu - 1)n + k] - P_\nu^{(l)}(k) \right\}^2 \qquad (11)$$

and averaging it over $\nu$'s, we get the $q$th-order fluctuation function

$$F_q(n) = \left\{ \frac{1}{2M_{es}} \sum_{\nu=1}^{2M_{es}} \left[ F^2(\nu, n) \right]^{q/2} \right\}^{1/q}, \qquad q \in \mathbb{R}, \qquad (12)$$

for all values of $n$. For a signal of fractal character $F_q(n)$ obeys a
power-law functional dependence on $n$:

$$F_q(n) \sim n^{h(q)}, \qquad (13)$$

at least for some range of $n$. If this is the case, the MFDFA procedure
provides a family of generalized Hurst exponents $h(q)$, which form a
decreasing function of $q$ for a multifractal signal or are independent of
$q$ for a monofractal one. A compact way to present the result graphically is
to calculate the singularity spectrum $f(\alpha)$ through the relations:

$$\alpha = h(q) + q h'(q), \qquad f(\alpha) = q[\alpha - h(q)] + 1. \qquad (14)$$
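
The procedure of Eqs. (10)-(14) can be sketched compactly as below. This is
an illustrative implementation under stated assumptions, not the code used
for the results of this paper: the detrending order defaults to 2, the case
$q = 0$ is handled by the usual logarithmic limit, $h'(q)$ is approximated by
finite differences, and the test signal is Gaussian white noise.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def mfdfa(z, scales, qs, order=2):
    """Fluctuation functions F_q(n), Eqs. (10)-(12); shape (len(qs), len(scales))."""
    z = np.asarray(z, dtype=float)
    Y = np.cumsum(z - z.mean())                        # signal profile, Eq. (10)
    N = len(Y)
    Fq = np.empty((len(qs), len(scales)))
    for j, n in enumerate(scales):
        n = int(n)
        M = N // n
        # M segments from the beginning and M from the end: 2M in total.
        segs = np.concatenate([Y[:M * n].reshape(M, n),
                               Y[N - M * n:].reshape(M, n)])
        k = np.arange(n)
        coeffs = npoly.polyfit(k, segs.T, order)       # one polynomial per segment
        trend = npoly.polyval(k, coeffs)               # shape (2M, n)
        F2 = np.mean((segs - trend) ** 2, axis=1)      # detrended variance, Eq. (11)
        for i, q in enumerate(qs):
            if q == 0:
                Fq[i, j] = np.exp(0.5 * np.mean(np.log(F2)))   # limit q -> 0
            else:
                Fq[i, j] = np.mean(F2 ** (q / 2.0)) ** (1.0 / q)   # Eq. (12)
    return Fq

def generalized_hurst(scales, Fq):
    """h(q) as the slope of log F_q(n) versus log n, Eq. (13)."""
    log_n = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_n, np.log(f), 1)[0] for f in Fq])

def singularity_spectrum(qs, h):
    """alpha and f(alpha) from h(q), Eq. (14), with a finite-difference h'(q)."""
    dh = np.gradient(h, qs)
    alpha = h + qs * dh
    return alpha, qs * (alpha - h) + 1.0

rng = np.random.default_rng(4)
z = rng.standard_normal(8192)                          # monofractal test signal
scales = np.array([16, 32, 64, 128, 256])
qs = np.arange(-3, 4, dtype=float)
Fq = mfdfa(z, scales, qs)
h = generalized_hurst(scales, Fq)
alpha, f = singularity_spectrum(qs, h)
```

For uncorrelated Gaussian noise one expects $h(q)$ close to 0.5 for all $q$
and hence a narrow $f(\alpha)$; a wide spectrum obtained for an eigensignal
is then a signature of multifractality.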

Some representative final results of such an analysis are shown in Figure 7.
Both eigensignals presented in Figure 6 ($Z_1$ and $Z_{52}$) develop
convincing multifractality, with the spectrum $f(\alpha)$ of about the same
width, even though the latter one represents the middle of the eigenvalue
spectrum. The maxima of these $f(\alpha)$ spectra are, however, located at
different positions, even relative to $\alpha = 0.5$, which may reflect
either persistency or antipersistency in the underlying time series. The
widths of $f(\alpha)$ are of comparable magnitude for all other eigensignals
as well. As a global documentation of this fact, the average of the
corresponding singularity spectra over all $i = 2, \ldots, 100$ is also shown
in this Figure. This average displays a maximum exactly at $\alpha = 0.5$.
Fig. 7. Singularity spectra $f(\alpha)$ for the eigensignal corresponding to
the largest eigenvalue $\lambda_1$ (solid line), to the average over all
other eigensignals $Z_i$, $i = 2, \ldots, 100$ (dashed line), and to
$\lambda_{52}$ (dotted line) of the empirical correlation matrix.



The results presented in this contribution provide further evidence that the
financial markets constitute a real complexity. The stock market
cross-correlations viewed through the eigenspectrum of the correlation matrix
show the existence of a linear collective market component represented by one
pronounced eigenvalue which is well separated from the bulk of eigenvalues.
This 'bulk' is, however, not of the Wishart random matrix ensemble type,
which is especially clearly seen when the time series used to construct the
correlation matrix are sufficiently long. The fact that the financial
cross-correlations appear not to be simply decomposable into 'market',
'sectors', and an uncorrelated Wishart 'bulk' has to do with their nonlinear
character both in space and in time. This profound nonlinearity manifests
itself in the multifractal nature of all the principal components
(eigensignals), which represent different portfolios, and in the volatility
cross-correlations. This signals that information about the whole market is
already encoded in all its constituents. This does not necessarily mean that
the whole amount of information involved is of practical interest or
importance. However, in order to disentangle, in the spirit of the Random
Matrix Theory, what is more relevant from what is less, a more extended
variant of the random matrix ensemble is called for. In view of the results
presented above, when postulating an appropriate RMT variant to be used as a
reference in the financial context, one definitely needs to redefine the
notion of noise such that some of the correlations are already built in.